Abstract

We describe estimators $\chi_n(X_0, X_1, \dots, X_n)$ which, when applied
to an unknown stationary process taking values from a countable
alphabet $\mathcal{X}$, converge almost surely to $k$ if the process is a $k$-th
order Markov chain and to infinity otherwise.
1 Introduction



When faced with an unknown stationary and ergodic stochastic process
$X_1, X_2, \dots, X_n, \dots$ one may try to determine various properties of this
process from the successive observations up to time $n$. For example, one might
try to estimate the entropy of the process. Several schemes of the form
$g_n(X_1, \dots, X_n)$ are known which converge almost surely to the entropy
of the process $\{X_n\}$, cf. Bailey [1], Csiszar and Shields [2], Csiszar [3],
Ornstein and Weiss [8], [7], [9], Kontoyiannis, Algoet, Suhov and Wyner [6]
and Ziv [10]. However, if one just wants to determine whether or not the
process has positive entropy (often associated with the popular notion of
chaos) then there is no sequence of two-valued functions
$e_n(X_1, \dots, X_n) \in \{ZERO, POSITIVE\}$ with the property that, almost
surely, $e_n$ stabilizes at ZERO for all zero entropy processes and at
POSITIVE for all positive entropy processes. (While this result does not
appear explicitly in Ornstein and Weiss [7], it can be readily established
using a very simple variant of the construction given there in § 4.)

A similar situation obtains in testing for membership in the class of $k$-th
order Markov chains. One can estimate the order of a Markov chain by, e.g.,
the method of Csiszar and Shields [2] or Csiszar [3]. They show that the
minimum description length Markov estimator converges almost surely
to the correct order if the alphabet size is bounded a priori, and that
this is no longer true without that assumption. To accomplish their goals
they study the large scale typicality of Markov sample paths. A further
negative result is that of Bailey [1], who showed that no two-valued test
exists for testing mixing Markov vs. not mixing Markov.

We will present a more direct estimator for the order of a Markov chain
which also uses the fact that there are universal rates for the convergence
of empirical $k$-block distributions in this class. Our approach enables us to
dispense with the assumption that the alphabet size is bounded; indeed, the
alphabet may even be infinite, as long as there is a finite memory. In
addition we will show that if the process is not a Markov chain then the
estimate for the order will tend to infinity. This is in complete analogy
with the entropy estimation that we mentioned earlier.
2 The Order Estimator



Let $\{X_n\}_{n=-\infty}^{\infty}$ be a stationary and ergodic time series taking
values from a discrete (finite or countably infinite) alphabet $\mathcal{X}$.
(Note that all stationary time series $\{X_n\}_{n=0}^{\infty}$ can be thought of
as two-sided time series, that is, $\{X_n\}_{n=-\infty}^{\infty}$.) For notational
convenience, let $X_m^n = (X_m, \dots, X_n)$, where $m \le n$. Note that if
$m > n$ then $X_m^n$ is the empty string.

Let $p(x_{-k}^0)$ and $p(y|x_{-k}^0)$ denote the distribution
$P(X_{-k}^0 = x_{-k}^0)$ and the conditional distribution
$P(X_1 = y \mid X_{-k}^0 = x_{-k}^0)$, respectively.

A discrete alphabet stationary time series is said to be a Markov chain if for
some $K \ge 0$, for all $y \in \mathcal{X}$, $i \ge 1$ and
$z_{-K-i+1}^0 \in \mathcal{X}^{K+i}$, if $p(z_{-K-i+1}^0) > 0$ then

$$p(y|z_{-K+1}^0) = p(y|z_{-K-i+1}^0).$$

The order of a Markov chain is the smallest such $K$.

In order to estimate the order we need to define some explicit statistics.
For $k \ge 0$ let $S_k$ denote the support of the distribution of $X_{-k}^0$, that is,

$$S_k = \{x_{-k}^0 \in \mathcal{X}^{k+1} : p(x_{-k}^0) > 0\}.$$

Define

$$\Delta_k = \sup_{1 \le i}\ \sup_{(z_{-k-i+1}^0,\,x) \in S_{k+i}} \left| p(x|z_{-k+1}^0) - p(x|z_{-k-i+1}^0) \right|.$$

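We will divide the data segment $X_0^n$ into two parts: $X_0^{\lceil n/2 \rceil - 1}$
and $X_{\lceil n/2 \rceil}^n$. Let $S_{n,k}^{(1)}$ denote the set of strings of
length $k+1$ which appear at all in $X_0^{\lceil n/2 \rceil - 1}$. That is,

$$S_{n,k}^{(1)} = \{x_{-k}^0 \in \mathcal{X}^{k+1} : \exists\, k \le t \le \lceil n/2 \rceil - 1 : X_{t-k}^t = x_{-k}^0\}.$$

For a fixed $0 < \gamma < 1$ let $S_{n,k}^{(2)}$ denote the set of strings of
length $k+1$ which appear more than $n^{1-\gamma}$ times in
$X_{\lceil n/2 \rceil}^n$. That is,

$$S_{n,k}^{(2)} = \{x_{-k}^0 \in \mathcal{X}^{k+1} : \#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = x_{-k}^0\} > n^{1-\gamma}\}.$$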

Let

$$S_k^n = S_{n,k}^{(1)} \cap S_{n,k}^{(2)}.$$
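
The two-part construction can be made concrete with a small sketch. The function name `candidate_set` and the representation of the sample as a Python list `x = [X_0, ..., X_n]` are assumptions made here for illustration only; they are not part of the original construction.

```python
from collections import Counter


def candidate_set(x, k, gamma):
    """Return S_k^n for the sample x = (X_0, ..., X_n).

    A string of length k+1 is kept if it occurs at least once in the
    first part X_0..X_{h-1} and more than n**(1-gamma) times in the
    second part X_h..X_n, where h = ceil(n/2).
    """
    n = len(x) - 1
    h = (n + 1) // 2  # ceil(n/2) for a non-negative integer n
    # S^(1): blocks X_{t-k}^t with k <= t <= h-1
    first = {tuple(x[t - k:t + 1]) for t in range(k, h)}
    # Occurrence counts of blocks X_{t-k}^t with h+k <= t <= n
    counts = Counter(tuple(x[t - k:t + 1]) for t in range(h + k, n + 1))
    threshold = n ** (1 - gamma)
    # Intersection of S^(1) with the frequently occurring strings S^(2)
    return {s for s in first if counts[s] > threshold}
```

For a strictly alternating sample such as `[0, 1] * 50`, both blocks `(0, 1)` and `(1, 0)` pass the frequency threshold when `gamma = 0.5`, while a small `gamma` raises the threshold $n^{1-\gamma}$ enough to leave the set empty.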






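For notational convenience, let $C(x|z_{-k+1}^0 : [n_1, n_2])$ denote the empirical
conditional probability of $X_1 = x$ given $X_{-k+1}^0 = z_{-k+1}^0$ from the samples
$(X_{n_1}, \dots, X_{n_2})$, that is,

$$C(x|z_{-k+1}^0 : [n_1, n_2]) = \frac{\#\{n_1 + k - 1 \le t \le n_2 - 1 : X_{t-k+1}^{t+1} = (z_{-k+1}^0, x)\}}{\#\{n_1 + k - 1 \le t \le n_2 - 1 : X_{t-k+1}^{t} = z_{-k+1}^0\}},$$

where 0/0 is defined as 0.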
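
A direct, unoptimized transcription of this ratio is sketched below. The function name `empirical_conditional` and its argument layout are hypothetical; the context is passed as a tuple `z` of length $k$, and the convention 0/0 = 0 is implemented explicitly.

```python
def empirical_conditional(x, z, sym, n1, n2):
    """Empirical conditional probability C(sym | z : [n1, n2]).

    x   : sequence of samples, indexed so that x[t] = X_t
    z   : tuple of length k, the conditioning context
    sym : the symbol whose conditional frequency is wanted
    Only the samples X_{n1}, ..., X_{n2} are used; 0/0 is taken to be 0.
    """
    k = len(z)
    num = den = 0
    for t in range(n1 + k - 1, n2):            # t = n1+k-1, ..., n2-1
        if tuple(x[t - k + 1:t + 1]) == tuple(z):
            den += 1                            # context z ends at time t
            if x[t + 1] == sym:
                num += 1                        # ... and is followed by sym
    return num / den if den else 0.0
```

With an empty context (`z = ()`, that is, $k = 0$) the function reduces to the relative frequency of `sym` among $X_{n_1}, \dots, X_{n_2}$.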

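We define the empirical version of $\Delta_k$ as follows:

$$\hat\Delta_k^n = \max_{1 \le i \le n}\ \max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right|.$$

Observe that, by ergodicity, for any fixed $k$,

$$\liminf_{n \to \infty} \hat\Delta_k^n \ge \Delta_k \quad \text{almost surely.} \qquad (1)$$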



We define an estimate $\chi_n$ for the order from samples $X_0^n$ as follows. Let
$0 < \beta < \frac{1-\gamma}{2}$ be arbitrary. Set $\chi_0 = 0$, and for $n \ge 1$ let
$\chi_n$ be the smallest $0 \le k_n < n$ such that $\hat\Delta_{k_n}^n \le n^{-\beta}$.
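
The whole procedure can be sketched in a few lines. This is only an illustrative implementation under stated assumptions: the names `block_counts`, `cond_tables`, `delta_hat` and `estimate_order`, the default values `beta=0.2` and `gamma=0.5`, and the early exit from the loop over $i$ are choices made here, not taken from the text. The early exit is harmless because a longer string cannot occur more often in the second part than its own prefix, and a maximum over an empty set is taken to be 0.

```python
from collections import Counter


def block_counts(x, lo, hi, length):
    """Count blocks of the given length lying entirely inside x[lo..hi]."""
    return Counter(tuple(x[t - length + 1:t + 1])
                   for t in range(lo + length - 1, hi + 1))


def cond_tables(x, h, n, length):
    """Counts of blocks (z, sym) in x[h..n] and of their contexts z."""
    cnt = block_counts(x, h, n, length)
    den = Counter()
    for s, c in cnt.items():
        den[s[:-1]] += c          # occurrences of z that have a successor
    return cnt, den


def delta_hat(x, k, gamma):
    """Empirical statistic for memory length k from x = (X_0, ..., X_n)."""
    n = len(x) - 1
    h = (n + 1) // 2              # ceil(n/2)
    thr = n ** (1 - gamma)
    cnt_s, den_s = cond_tables(x, h, n, k + 1)
    best = 0.0
    for i in range(1, n + 1):
        length = k + i + 1        # length of the string (z_{-k-i+1}^0, sym)
        cnt_l, den_l = cond_tables(x, h, n, length)
        frequent = [s for s, c in cnt_l.items() if c > thr]
        if not frequent:
            break                 # longer strings cannot be frequent either
        seen_first = set(block_counts(x, 0, h - 1, length))
        for s in frequent:
            if s in seen_first:   # s belongs to S^(1) and S^(2)
                z_long, sym = s[:-1], s[-1]
                z_short = z_long[i:]          # last k symbols of the context
                c_short = cnt_s[z_short + (sym,)] / den_s[z_short]
                c_long = cnt_l[s] / den_l[z_long]
                best = max(best, abs(c_short - c_long))
    return best


def estimate_order(x, beta=0.2, gamma=0.5):
    """Smallest k < n whose empirical statistic is at most n**(-beta)."""
    if not (0 < gamma < 1 and 0 < beta < (1 - gamma) / 2):
        raise ValueError("need 0 < gamma < 1 and 0 < beta < (1 - gamma)/2")
    n = len(x) - 1
    for k in range(n):
        if delta_hat(x, k, gamma) <= n ** (-beta):
            return k
    return 0                      # only reached for n = 0, where chi_0 = 0
```

On deterministic periodic samples the counts are exact, so the behaviour is easy to follow by hand: a constant sequence is accepted at $k = 0$, a strictly alternating sequence at $k = 1$, and the period-three pattern `0, 0, 1` at $k = 2$. For genuinely random data the estimate is only guaranteed to be correct eventually, so small samples may give a different value.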

THEOREM. If the stationary and ergodic time series $\{X_n\}$ taking values
from a discrete alphabet happens to be a Markov chain of some finite order
then $\chi_n$ equals the order eventually almost surely, and if it is not Markov
of any finite order then $\chi_n \to \infty$ almost surely.

Application: Let $M > 0$ be arbitrary. The goal is to decide whether the discrete
alphabet stationary and ergodic time series is a Markov chain with order
less than $M$ or not. One may use $\chi_n$ and say YES if $\chi_n < M$ and NO
otherwise. By the Theorem, the answer will eventually be correct almost surely.



3 Proof of the Theorem 



Proof: If the process is a Markov chain, it is immediate that for all $k$ greater
than or equal to the order, $\Delta_k = 0$. For $k$ less than the order, $\Delta_k > 0$.
If the process is not a Markov chain of any finite order then $\Delta_k > 0$ for all $k$.
Since $n^{-\beta} \to 0$, it follows from (1) that if the process is not Markov then
$\chi_n \to \infty$, and if it is Markov then $\chi_n$ is greater than or equal to the
order eventually almost surely. We have to show that $\chi_n$ is less than or equal
to the order eventually almost surely provided that the process is a Markov chain.

Assume that the process is a Markov chain with order k. Let n > k. We will 
estimate the probability of the undesirable event as follows: 



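$$P\left(\hat\Delta_k^n > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \le \sum_{i=1}^{n} P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right).$$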



We can estimate each probability in the sum as the sum of two terms: 



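$$
\begin{aligned}
&P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\quad \le P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - p(x|z_{-k+1}^0) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\qquad + P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| p(x|z_{-k+1}^0) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right).
\end{aligned}
$$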



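We overestimate these probabilities. For any $m \ge 0$ and $x_{-m}^0$ define
$\sigma_i^m(x_{-m}^0)$ as the time of the $i$-th occurrence of the string $x_{-m}^0$
in the data segment starting at $X_{\lceil n/2 \rceil}$, that is, let
$\sigma_0^m(x_{-m}^0) = \lceil n/2 \rceil + m - 1$ and for $i \ge 1$ define

$$\sigma_i^m(x_{-m}^0) = \min\{t > \sigma_{i-1}^m(x_{-m}^0) : X_{t-m}^t = x_{-m}^0\}.$$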



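Now

$$
\begin{aligned}
&P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\quad \le P\left(\max_{(z_{-k+1}^0,\,x) \in S_{n,k}^{(1)}}\ \sup_{j > n^{1-\gamma}} \left| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k-1}(z_{-k+1}^0)+1} = x\}} - p(x|z_{-k+1}^0) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\qquad + P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{n,k+i}^{(1)}}\ \sup_{j > n^{1-\gamma}} \left| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k+i-1}(z_{-k-i+1}^0)+1} = x\}} - p(x|z_{-k+1}^0) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right).
\end{aligned}
$$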
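Since both $S_{n,k}^{(1)}$ and $S_{n,k+i}^{(1)}$ depend solely on $X_0^{\lceil n/2 \rceil - 1}$ we get

$$
\begin{aligned}
&P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\quad \le \sum_{(z_{-k+1}^0,\,x) \in S_{n,k}^{(1)}}\ \sum_{j = \lceil n^{1-\gamma} \rceil}^{\infty} P\left(\left| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k-1}(z_{-k+1}^0)+1} = x\}} - p(x|z_{-k+1}^0) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \\
&\qquad + \sum_{(z_{-k-i+1}^0,\,x) \in S_{n,k+i}^{(1)}}\ \sum_{j = \lceil n^{1-\gamma} \rceil}^{\infty} P\left(\left| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k+i-1}(z_{-k-i+1}^0)+1} = x\}} - p(x|z_{-k+1}^0) \right| > 0.5\, n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right).
\end{aligned}
$$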



Each of these represents the deviation of an empirical count from its mean.
The variables in question are independent since whenever the block $z_{-k+1}^0$
occurs the next term is chosen using the same distribution $p(x|z_{-k+1}^0)$. Thus
by Hoeffding's inequality (cf. Hoeffding [5] or Theorem 8.1 of Devroye et
al. [4]) for sums of bounded independent random variables, and since the
cardinality of both $S_{n,k}^{(1)}$ and $S_{n,k+i}^{(1)}$ is not greater than $(n+2)/2$, we have



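$$P\left(\max_{(z_{-k-i+1}^0,\,x) \in S_{k+i}^n} \left| C(x|z_{-k+1}^0 : [\lceil n/2 \rceil, n]) - C(x|z_{-k-i+1}^0 : [\lceil n/2 \rceil, n]) \right| > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \le 2 \cdot \frac{n+2}{2} \sum_{j = \lceil n^{1-\gamma} \rceil}^{\infty} 2 e^{-\frac{1}{2} n^{-2\beta} j}.$$

Here each summand is the Hoeffding bound $2e^{-2j\epsilon^2}$ with $\epsilon = 0.5\, n^{-\beta}$.
Writing $a = \frac{1}{2} n^{-2\beta}$ and $J = \lceil n^{1-\gamma} \rceil$, the geometric series
satisfies $\sum_{j \ge J} e^{-aj} = e^{-aJ}/(1 - e^{-a}) \le (1 + 1/a)\, e^{-aJ} \le 3n\, e^{-aJ}$,
since $2\beta < 1$. Thus, summing over $1 \le i \le n$,

$$P\left(\hat\Delta_k^n > n^{-\beta} \,\middle|\, X_0^{\lceil n/2 \rceil - 1}\right) \le 6 n^2 (n+2)\, e^{-\frac{1}{2} n^{-2\beta+1-\gamma}}.$$

Integrating both sides we get

$$P\left(\hat\Delta_k^n > n^{-\beta}\right) \le 6 n^2 (n+2)\, e^{-\frac{1}{2} n^{-2\beta+1-\gamma}}.$$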



The right hand side is summable provided $2\beta + \gamma < 1$, and the Borel-Cantelli
Lemma yields that $P(\hat\Delta_k^n \le n^{-\beta} \text{ eventually}) = 1$. Thus $\chi_n \le k$ eventually
almost surely provided the process is Markov with order $k$. The proof of the
Theorem is complete.
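
The role of the condition $2\beta + \gamma < 1$ can be checked numerically. The sketch below evaluates the logarithm of a bound of the form $n^{d} e^{-c\, n^{1-\gamma-2\beta}}$; the function name `log_bound`, the polynomial degree `degree` and the constant `c` are placeholders assumed here for illustration, since only the shape of the bound (a polynomial factor times a stretched exponential) matters for summability.

```python
import math


def log_bound(n, beta, gamma, degree=3, c=0.5):
    """Natural log of n**degree * exp(-c * n**(1 - gamma - 2*beta)).

    Working with the logarithm avoids overflow and underflow for large n.
    """
    return degree * math.log(n) - c * n ** (1 - gamma - 2 * beta)


if __name__ == "__main__":
    for n in (10**3, 10**10, 10**30):
        # 2*beta + gamma = 0.9 < 1: the bound eventually decays
        # 2*beta + gamma = 1.1 > 1: the bound keeps growing
        print(n, log_bound(n, 0.2, 0.5), log_bound(n, 0.3, 0.5))
```

When $2\beta + \gamma$ is close to 1 the exponent $n^{1-\gamma-2\beta}$ grows very slowly, so a bound of this shape becomes small only for extremely large $n$. This is consistent with the Theorem, which is a statement about eventual behaviour and says nothing about rates at a fixed sample size.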